A Bit-Parallel Deterministic Stochastic Multiplier
This paper presents a novel bit-parallel deterministic stochastic multiplier,
which improves the area-energy-latency product by up to 10.610,
while reducing the computational error by 32.2%, compared to three prior
stochastic multipliers.
Comment: To appear at IEEE ISQED 202
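In stochastic computing, multiplication of two unipolar values reduces to a bitwise AND of their bitstreams, and deterministic variants make the result exact rather than statistical. A minimal Python sketch of one deterministic scheme (the unary encoding, stream length, and rotation pairing below are illustrative assumptions, not the paper's bit-parallel circuit):

```python
def unary_stream(value, length):
    """Encode a value in [0, 1] as a deterministic unary bitstream:
    the first round(value * length) bits are 1, the rest are 0."""
    ones = round(value * length)
    return [1] * ones + [0] * (length - ones)

def rotate(stream, k):
    """Cyclically rotate a bitstream left by k positions."""
    return stream[k:] + stream[:k]

def det_stochastic_multiply(a, b, length=16):
    """Deterministic stochastic multiply: AND stream A against every
    rotation of stream B and average. Each bit of A meets each bit of
    B exactly once, so the count of AND-ones equals ones_A * ones_B."""
    sa, sb = unary_stream(a, length), unary_stream(b, length)
    ones = sum(x & y for k in range(length)
               for x, y in zip(sa, rotate(sb, k)))
    return ones / (length * length)
```

Because every bit of one stream meets every bit of the other exactly once across the rotations, the result is exact rather than an approximation, e.g. 0.5 × 0.5 yields precisely 0.25 with 16-bit streams.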
Photonic Reconfigurable Accelerators for Efficient Inference of CNNs with Mixed-Sized Tensors
Photonic Microring Resonator (MRR) based hardware accelerators have been
shown to provide disruptive speedup and energy-efficiency improvements for
processing deep Convolutional Neural Networks (CNNs). However, previous
MRR-based CNN accelerators fail to provide efficient adaptability for CNNs with
mixed-sized tensors; depthwise separable CNNs are one such example.
Performing inferences of CNNs with mixed-sized tensors on such inflexible
accelerators often leads to low hardware utilization, which diminishes the
achievable performance and energy efficiency from the accelerators. In this
paper, we present a novel way of introducing reconfigurability in the MRR-based
CNN accelerators, to enable dynamic maximization of the size compatibility
between the accelerator hardware components and the CNN tensors that are
processed using the hardware components. We classify the state-of-the-art
MRR-based CNN accelerators from prior works into two categories, based on the
layout and relative placements of the utilized hardware components in the
accelerators. We then use our method to introduce reconfigurability in
accelerators from these two classes, thereby improving their parallelism,
their flexibility in efficiently mapping tensors of different sizes, their speed, and
overall energy efficiency. We evaluate our reconfigurable accelerators against
three prior works under an area-proportionate outlook (equal hardware area for
all accelerators). Our evaluation for the inference of four modern CNNs
indicates that our designed reconfigurable CNN accelerators provide
improvements of up to 1.8x in Frames-Per-Second (FPS) and up to 1.5x in FPS/W,
compared to an MRR-based accelerator from prior work.
Comment: Paper accepted at CASES (ESWEEK) 202
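The utilization problem described above can be made concrete: when a dot product is folded onto a fixed-size vector-dot-product (VDP) unit, any leftover multiplier lanes idle. A back-of-the-envelope sketch (the VDP size of 64 and the layer shapes are assumed for illustration, not taken from the paper):

```python
import math

def vdp_utilization(vdp_size, dot_product_size):
    """Fraction of a fixed-size VDP unit's multipliers doing useful work
    when a dot product of the given size is folded onto it."""
    folds = math.ceil(dot_product_size / vdp_size)
    return dot_product_size / (folds * vdp_size)

# A standard 3x3 conv over 256 input channels forms dot products of
# size 3*3*256 = 2304; a depthwise 3x3 conv forms dot products of
# size only 3*3 = 9, leaving most of a 64-lane VDP unit idle.
for name, n in [("standard 3x3, 256 ch", 2304), ("depthwise 3x3", 9)]:
    print(f"{name}: utilization = {vdp_utilization(64, n):.2%}")
```

The depthwise case uses only 9 of 64 lanes (about 14% utilization), which is exactly the mixed-tensor-size inefficiency that reconfigurability targets.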
A Silicon Nitride Microring Based High-Speed, Tuning-Efficient, Electro-Refractive Modulator
The use of the Silicon-on-Insulator (SOI) platform has been prominent for
realizing CMOS-compatible, high-performance photonic integrated circuits
(PICs). But in recent years, the silicon-nitride-on-silicon-dioxide
(SiN-on-SiO2) platform has garnered increasing interest as an alternative to
the SOI platform for realizing high-performance PICs. This is because of its
several beneficial properties over the SOI platform, such as low optical
losses, high thermo-optic stability, broader wavelength transparency range, and
high tolerance to fabrication-process variations. However, SiN-on-SiO2 based
active devices such as modulators are scarce and lack the desired performance,
due to the absence of free-carrier-based activity in the SiN material and the
complexity of integrating other active materials with the SiN-on-SiO2 platform.
This shortcoming hinders the use of the SiN-on-SiO2 platform for realizing
active PICs. To address this shortcoming, we demonstrate a SiN-on-SiO2 microring
resonator (MRR) based active modulator in this article. Our designed MRR
modulator employs an Indium-Tin-Oxide (ITO)-SiN-ITO thin-film stack, in which
the ITO thin films act as the upper and lower claddings of the SiN MRR. The
ITO-SiN-ITO thin-film stack leverages the free-carrier assisted, high-amplitude
refractive index change in the ITO films to effect a large electro-refractive
optical modulation in the device. Based on the electrostatic, transient, and
finite difference time domain (FDTD) simulations, conducted using photonics
foundry-validated tools, we show that our modulator achieves 280 pm/V resonance
modulation efficiency, 67.8 GHz 3-dB modulation bandwidth, 19 nm
free-spectral range (FSR), 0.23 dB insertion loss, and 10.31 dB
extinction ratio for optical on-off-keying (OOK) modulation at 30 Gb/s.
AGNI: In-Situ, Iso-Latency Stochastic-to-Binary Number Conversion for In-DRAM Deep Learning
Recent years have seen a rapid increase in research activity in the field of
DRAM-based Processing-In-Memory (PIM) accelerators, where the analog computing
capability of DRAM is employed by minimally changing the inherent structure of
DRAM peripherals to accelerate various data-centric applications. Several
DRAM-based PIM accelerators for Convolutional Neural Networks (CNNs) have also
been reported. Among these, the accelerators leveraging in-DRAM stochastic
arithmetic have shown manifold improvements in processing latency and
throughput, due to the ability of stochastic arithmetic to convert
multiplications into simple bit-wise logical AND operations. However, the use of
in-DRAM stochastic arithmetic for CNN acceleration requires frequent stochastic
to binary number conversions. For that, prior works employ full-adder-based or
serial-counter-based in-DRAM circuits. These circuits consume large area and
incur long latency. Their in-DRAM implementations also require heavy
modifications in DRAM peripherals, which significantly diminishes the benefits
of using stochastic arithmetic in these accelerators. To address these
shortcomings, this paper presents a new substrate for in-DRAM
stochastic-to-binary number conversion called AGNI. AGNI makes minor
modifications in DRAM peripherals using pass transistors, capacitors, encoders,
and charge pumps, and re-purposes the sense amplifiers as voltage comparators,
to enable in-situ binary conversion of input stochastic operands of different
sizes with iso-latency.
Comment: (Preprint) To appear at ISQED 202
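Functionally, stochastic-to-binary conversion is just a population count of the bitstream. A minimal Python sketch of the baseline behavior (the stream packing and cycle model are illustrative; AGNI replaces the digital counter with analog charge sharing on the repurposed sense amplifiers):

```python
def stream_to_binary(stream: int) -> int:
    """Binary magnitude of a stochastic bitstream = its popcount."""
    return bin(stream).count("1")

def serial_counter_latency(stream_length: int) -> int:
    """A bit-serial counter spends one cycle per stream bit, so its
    latency grows linearly with operand size -- the overhead an
    iso-latency in-situ conversion avoids."""
    return stream_length

# e.g. a 16-bit stream encoding the value 11
assert stream_to_binary(0b0110111011101101) == 11
```

The point of the comparison: a counter-based conversion of a 256-bit stream costs 256 cycles, whereas AGNI's claim is that conversion latency stays flat across operand sizes.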
SCONNA: A Stochastic Computing Based Optical Accelerator for Ultra-Fast, Energy-Efficient Inference of Integer-Quantized CNNs
The acceleration of a CNN inference task involves convolution operations that are
typically transformed into vector-dot-product (VDP) operations. Several
photonic microring resonators (MRRs) based hardware architectures have been
proposed to accelerate integer-quantized CNNs with remarkably higher throughput
and energy efficiency compared to their electronic counterparts. However, the
existing photonic MRR-based analog accelerators exhibit a very strong trade-off
between the achievable input/weight precision and VDP operation size, which
severely restricts their achievable VDP operation size for the quantized
input/weight precision of 4 bits and higher. The restricted VDP operation size
ultimately suppresses computing throughput to severely diminish the achievable
performance benefits. To address this shortcoming, we for the first time
present a merger of stochastic computing and MRR-based CNN accelerators. To
leverage the innate precision flexibility of stochastic computing, we invent an
MRR-based optical stochastic multiplier (OSM). We employ multiple OSMs in a
cascaded manner using dense wavelength division multiplexing, to forge a novel
Stochastic Computing based Optical Neural Network Accelerator (SCONNA). SCONNA
achieves significantly high throughput and energy efficiency for accelerating
inferences of high-precision quantized CNNs. Our evaluation for the inference
of four modern CNNs at 8-bit input/weight precision indicates that SCONNA
provides improvements of up to 66.5x, 90x, and 91x in frames-per-second (FPS),
FPS/W and FPS/W/mm^2, respectively, on average over two photonic MRR-based
analog CNN accelerators from prior work, with Top-1 accuracy drop of only up to
0.4% for large CNNs and up to 1.5% for small CNNs. We developed a
transaction-level, event-driven Python-based simulator for the evaluation of
SCONNA and other accelerators (https://github.com/uky-UCAT/SC_ONN_SIM.git).
Comment: To appear at IPDPS 202
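The precision/size trade-off behind this design can be quantified: an exact AND-based stochastic multiply of two b-bit operands touches on the order of 2^b × 2^b stream-bit pairs, which is why higher precision is costly and why massive wavelength-division parallelism helps. A rough cost model (the operation counts are textbook figures for deterministic stochastic versus binary array multiplication, assumed here for illustration, not SCONNA's measured numbers):

```python
def stochastic_bitops(bits: int) -> int:
    """Exact deterministic stochastic multiply: each b-bit operand
    becomes a 2^b-long unary stream, and every stream-bit pair of the
    two streams is ANDed once, giving (2^b)^2 bit operations."""
    return (1 << bits) ** 2

def binary_bitops(bits: int) -> int:
    """Binary array multiplier: b x b partial-product bit operations."""
    return bits * bits

for b in (4, 8):
    print(f"{b}-bit operands: stochastic {stochastic_bitops(b)} "
          f"vs binary {binary_bitops(b)} bit-ops")
```

At 8-bit precision the stochastic approach pays 65536 bit operations against 64 for binary, so the architecture only wins if those operations are performed massively in parallel, as with the cascaded OSMs over dense wavelength division multiplexing.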